{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Project 1: Trading with Momentum\n",
    "## Instructions\n",
    "Each problem consists of a function to implement and instructions on how to implement the function.  The parts of the function that need to be implemented are marked with a `# TODO` comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our `project_tests` package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.\n",
    "\n",
    "## Packages\n",
    "When you implement the functions, you'll only need to you use the packages you've used in the classroom, like [Pandas](https://pandas.pydata.org/) and [Numpy](http://www.numpy.org/). These packages will be imported for you. We recommend you don't add any import statements, otherwise the grader might not be able to run your code.\n",
    "\n",
    "The other packages that we're importing are `helper`, `project_helper`, and `project_tests`. These are custom packages built to help you solve the problems.  The `helper` and `project_helper` module contains utility functions and graph functions. The `project_tests` contains the unit tests for all the problems.\n",
    "\n",
    "### Install Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import helper\n",
    "import project_helper\n",
    "import project_tests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Market Data\n",
    "### Load Data\n",
    "The data we use for most of the projects is end of day data. This contains data for many stocks, but we'll be looking at stocks in the S&P 500. We also made things a little easier to run by narrowing down our range of time period instead of using all of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('../../data/project_1/eod-quotemedia.csv', parse_dates=['date'], index_col=False)\n",
    "\n",
    "close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')\n",
    "\n",
    "print('Loaded Data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Run the cell below to see what the data looks like for `close`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "project_helper.print_dataframe(close)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stock Example\n",
    "Let's see what a single stock looks like from the closing prices. For this example and future display examples in this project, we'll use Apple's stock (AAPL). If we tried to graph all the stocks, it would be too much information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "apple_ticker = 'AAPL'\n",
    "project_helper.plot_stock(close[apple_ticker], '{} Stock'.format(apple_ticker))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resample Adjusted Prices\n",
    "\n",
    "The trading signal you'll develop in this project does not need to be based on daily prices, for instance, you can use month-end prices to perform trading once a month. To do this, you must first resample the daily adjusted closing prices into monthly buckets, and select the last observation of each month.\n",
    "\n",
    "Implement the `resample_prices` to resample `close_prices` at the sampling frequency of `freq`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def resample_prices(close_prices, freq='M'):\n",
    "    \"\"\"\n",
    "    Resample close prices for each ticker at specified frequency.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    close_prices : DataFrame\n",
    "        Close prices for each ticker and date\n",
    "    freq : str\n",
    "        What frequency to sample at\n",
    "        For valid freq choices, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    prices_resampled : DataFrame\n",
    "        Resampled prices for each ticker and date\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "project_tests.test_resample_prices(resample_prices)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Let's apply this function to `close` and view the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "monthly_close = resample_prices(close)\n",
    "project_helper.plot_resampled_prices(\n",
    "    monthly_close.loc[:, apple_ticker],\n",
    "    close.loc[:, apple_ticker],\n",
    "    '{} Stock - Close Vs Monthly Close'.format(apple_ticker))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compute Log Returns\n",
    "\n",
    "Compute log returns ($R_t$) from prices ($P_t$) as your primary momentum indicator:\n",
    "\n",
    "$$R_t = log_e(P_t) - log_e(P_{t-1})$$\n",
    "\n",
    "Implement the `compute_log_returns` function below, such that it accepts a dataframe (like one returned by `resample_prices`), and produces a similar dataframe of log returns. Use Numpy's [log function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html) to help you calculate the log returns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def compute_log_returns(prices):\n",
    "    \"\"\"\n",
    "    Compute log returns for each ticker.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    prices : DataFrame\n",
    "        Prices for each ticker and date\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    log_returns : DataFrame\n",
    "        Log returns for each ticker and date\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "project_tests.test_compute_log_returns(compute_log_returns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Using the same data returned from `resample_prices`, we'll generate the log returns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "monthly_close_returns = compute_log_returns(monthly_close)\n",
    "project_helper.plot_returns(\n",
    "    monthly_close_returns.loc[:, apple_ticker],\n",
    "    'Log Returns of {} Stock (Monthly)'.format(apple_ticker))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Shift Returns\n",
    "Implement the `shift_returns` function to shift the log returns to the previous or future returns in the time series. For example, the parameter `shift_n` is 2 and `returns` is the following:\n",
    "\n",
    "```\n",
    "                           Returns\n",
    "               A         B         C         D\n",
    "2013-07-08     0.015     0.082     0.096     0.020     ...\n",
    "2013-07-09     0.037     0.095     0.027     0.063     ...\n",
    "2013-07-10     0.094     0.001     0.093     0.019     ...\n",
    "2013-07-11     0.092     0.057     0.069     0.087     ...\n",
    "...            ...       ...       ...       ...\n",
    "```\n",
    "\n",
    "the output of the `shift_returns` function would be:\n",
    "```\n",
    "                        Shift Returns\n",
    "               A         B         C         D\n",
    "2013-07-08     NaN       NaN       NaN       NaN       ...\n",
    "2013-07-09     NaN       NaN       NaN       NaN       ...\n",
    "2013-07-10     0.015     0.082     0.096     0.020     ...\n",
    "2013-07-11     0.037     0.095     0.027     0.063     ...\n",
    "...            ...       ...       ...       ...\n",
    "```\n",
    "Using the same `returns` data as above, the `shift_returns` function should generate the following with `shift_n` as -2:\n",
    "```\n",
    "                        Shift Returns\n",
    "               A         B         C         D\n",
    "2013-07-08     0.094     0.001     0.093     0.019     ...\n",
    "2013-07-09     0.092     0.057     0.069     0.087     ...\n",
    "...            ...       ...       ...       ...       ...\n",
    "...            ...       ...       ...       ...       ...\n",
    "...            NaN       NaN       NaN       NaN       ...\n",
    "...            NaN       NaN       NaN       NaN       ...\n",
    "```\n",
    "_Note: The \"...\" represents data points we're not showing._"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def shift_returns(returns, shift_n):\n",
    "    \"\"\"\n",
    "    Generate shifted returns\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    returns : DataFrame\n",
    "        Returns for each ticker and date\n",
    "    shift_n : int\n",
    "        Number of periods to move, can be positive or negative\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    shifted_returns : DataFrame\n",
    "        Shifted returns for each ticker and date\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "project_tests.test_shift_returns(shift_returns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Let's get the previous month's and next month's returns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "prev_returns = shift_returns(monthly_close_returns, 1)\n",
    "lookahead_returns = shift_returns(monthly_close_returns, -1)\n",
    "\n",
    "project_helper.plot_shifted_returns(\n",
    "    prev_returns.loc[:, apple_ticker],\n",
    "    monthly_close_returns.loc[:, apple_ticker],\n",
    "    'Previous Returns of {} Stock'.format(apple_ticker))\n",
    "project_helper.plot_shifted_returns(\n",
    "    lookahead_returns.loc[:, apple_ticker],\n",
    "    monthly_close_returns.loc[:, apple_ticker],\n",
    "    'Lookahead Returns of {} Stock'.format(apple_ticker))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Trading Signal\n",
    "\n",
    "A trading signal is a sequence of trading actions, or results that can be used to take trading actions. A common form is to produce a \"long\" and \"short\" portfolio of stocks on each date (e.g. end of each month, or whatever frequency you desire to trade at). This signal can be interpreted as rebalancing your portfolio on each of those dates, entering long (\"buy\") and short (\"sell\") positions as indicated.\n",
    "\n",
    "Here's a strategy that we will try:\n",
    "> For each month-end observation period, rank the stocks by _previous_ returns, from the highest to the lowest. Select the top performing stocks for the long portfolio, and the bottom performing stocks for the short portfolio.\n",
    "\n",
    "Implement the `get_top_n` function to get the top performing stock for each month. Get the top performing stocks from `prev_returns` by assigning them a value of 1. For all other stocks, give them a value of 0. For example, using the following `prev_returns`:\n",
    "\n",
    "```\n",
    "                                     Previous Returns\n",
    "               A         B         C         D         E         F         G\n",
    "2013-07-08     0.015     0.082     0.096     0.020     0.075     0.043     0.074\n",
    "2013-07-09     0.037     0.095     0.027     0.063     0.024     0.086     0.025\n",
    "...            ...       ...       ...       ...       ...       ...       ...\n",
    "```\n",
    "\n",
    "The function `get_top_n` with `top_n` set to 3 should return the following:\n",
    "```\n",
    "                                     Previous Returns\n",
    "               A         B         C         D         E         F         G\n",
    "2013-07-08     0         1         1         0         1         0         0\n",
    "2013-07-09     0         1         0         1         0         1         0\n",
    "...            ...       ...       ...       ...       ...       ...       ...\n",
    "```\n",
    "*Note: You may have to use Panda's [`DataFrame.iterrows`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html) with [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) in order to implement the function. This is one of those cases where creating a vecorization solution is too difficult.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def get_top_n(prev_returns, top_n):\n",
    "    \"\"\"\n",
    "    Select the top performing stocks\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    prev_returns : DataFrame\n",
    "        Previous shifted returns for each ticker and date\n",
    "    top_n : int\n",
    "        The number of top performing stocks to get\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    top_stocks : DataFrame\n",
    "        Top stocks for each ticker and date marked with a 1\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "project_tests.test_get_top_n(get_top_n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "We want to get the best performing and worst performing stocks. To get the best performing stocks, we'll use the `get_top_n` function. To get the worst performing stocks, we'll also use the `get_top_n` function. However, we pass in `-1*prev_returns` instead of just `prev_returns`. Multiplying by negative one will flip all the positive returns to negative and negative returns to positive. Thus, it will return the worst performing stocks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "top_bottom_n = 50\n",
    "df_long = get_top_n(prev_returns, top_bottom_n)\n",
    "df_short = get_top_n(-1*prev_returns, top_bottom_n)\n",
    "project_helper.print_top(df_long, 'Longed Stocks')\n",
    "project_helper.print_top(df_short, 'Shorted Stocks')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Projected Returns\n",
    "It's now time to check if your trading signal has the potential to become profitable!\n",
    "\n",
    "We'll start by computing the net returns this portfolio would return. For simplicity, we'll assume every stock gets an equal dollar amount of investment. This makes it easier to compute a portfolio's returns as the simple arithmetic average of the individual stock returns.\n",
    "\n",
    "Implement the `portfolio_returns` function to compute the expected portfolio returns. Using `df_long` to indicate which stocks to long and `df_short` to indicate which stocks to short, calculate the returns using `lookahead_returns`. To help with calculation, we've provided you with `n_stocks` as the number of stocks we're investing in a single period."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def portfolio_returns(df_long, df_short, lookahead_returns, n_stocks):\n",
    "    \"\"\"\n",
    "    Compute expected returns for the portfolio, assuming equal investment in each long/short stock.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    df_long : DataFrame\n",
    "        Top stocks for each ticker and date marked with a 1\n",
    "    df_short : DataFrame\n",
    "        Bottom stocks for each ticker and date marked with a 1\n",
    "    lookahead_returns : DataFrame\n",
    "        Lookahead returns for each ticker and date\n",
    "    n_stocks: int\n",
    "        The number number of stocks chosen for each month\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    portfolio_returns : DataFrame\n",
    "        Expected portfolio returns for each ticker and date\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "    \n",
    "    return None\n",
    "\n",
    "project_tests.test_portfolio_returns(portfolio_returns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Time to see how the portfolio did."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "expected_portfolio_returns = portfolio_returns(df_long, df_short, lookahead_returns, 2*top_bottom_n)\n",
    "project_helper.plot_returns(expected_portfolio_returns.T.sum(), 'Portfolio Returns')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistical Tests\n",
    "### Annualized Rate of Return"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "expected_portfolio_returns_by_date = expected_portfolio_returns.T.sum().dropna()\n",
    "portfolio_ret_mean = expected_portfolio_returns_by_date.mean()\n",
    "portfolio_ret_ste = expected_portfolio_returns_by_date.sem()\n",
    "portfolio_ret_annual_rate = (np.exp(portfolio_ret_mean * 12) - 1) * 100\n",
    "\n",
    "print(\"\"\"\n",
    "Mean:                       {:.6f}\n",
    "Standard Error:             {:.6f}\n",
    "Annualized Rate of Return:  {:.2f}%\n",
    "\"\"\".format(portfolio_ret_mean, portfolio_ret_ste, portfolio_ret_annual_rate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The annualized rate of return allows you to compare the rate of return from this strategy to other quoted rates of return, which are usually quoted on an annual basis. \n",
    "\n",
    "### T-Test\n",
    "Our null hypothesis ($H_0$) is that the actual mean return from the signal is zero. We'll perform a one-sample, one-sided t-test on the observed mean return, to see if we can reject $H_0$.\n",
    "\n",
    "We'll need to first compute the t-statistic, and then find its corresponding p-value. The p-value will indicate the probability of observing a t-statistic equally or more extreme than the one we observed if the null hypothesis were true. A small p-value means that the chance of observing the t-statistic we observed under the null hypothesis is small, and thus casts doubt on the null hypothesis. It's good practice to set a desired level of significance or alpha ($\\alpha$) _before_ computing the p-value, and then reject the null hypothesis if $p < \\alpha$.\n",
    "\n",
    "For this project, we'll use $\\alpha = 0.05$, since it's a common value to use.\n",
    "\n",
    "Implement the `analyze_alpha` function to perform a t-test on the sample of portfolio returns. We've imported the `scipy.stats` module for you to perform the t-test.\n",
    "\n",
    "Note: [`scipy.stats.ttest_1samp`](https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.stats.ttest_1samp.html) performs a two-sided test, so divide the p-value by 2 to get 1-sided p-value"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from scipy import stats\n",
    "\n",
    "def analyze_alpha(expected_portfolio_returns_by_date):\n",
    "    \"\"\"\n",
    "    Perform a t-test with the null hypothesis being that the expected mean return is zero.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    expected_portfolio_returns_by_date : Pandas Series\n",
    "        Expected portfolio returns for each date\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "    t_value\n",
    "        T-statistic from t-test\n",
    "    p_value\n",
    "        Corresponding p-value\n",
    "    \"\"\"\n",
    "    # TODO: Implement Function\n",
    "\n",
    "    return None\n",
    "\n",
    "project_tests.test_analyze_alpha(analyze_alpha)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data\n",
    "Let's see what values we get with our portfolio. After you run this, make sure to answer the question below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "t_value, p_value = analyze_alpha(expected_portfolio_returns_by_date)\n",
    "print(\"\"\"\n",
    "Alpha analysis:\n",
    " t-value:        {:.3f}\n",
    " p-value:        {:.6f}\n",
    "\"\"\".format(t_value, p_value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question: What p-value did you observe? And what does that indicate about your signal?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*#TODO: Put Answer In this Cell*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Submission\n",
    "Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a pass or not passed grade. You can continue to the next section while you wait for feedback."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
